Routing information through networks is a universal phenomenon in both natural and manmade
complex systems. When each node has full knowledge of the global network connectivity, finding 
short communication paths is merely a matter of distributed computation. However, in many real 
networks nodes communicate efficiently even without such global intelligence. Here we show that 
the peculiar structural characteristics of many complex networks support efficient communication 
without global knowledge. We also describe a general mechanism that explains this connection 
between network structure and function. This mechanism relies on the presence of a metric space 
hidden behind an observable network. Our findings suggest that real networks in nature have un- 
derlying metric spaces that remain undiscovered. Their discovery would have practical applications 
ranging from routing in the Internet and searching social networks, to studying information flows
in neural, gene regulatory networks, or signaling pathways. 



I. INTRODUCTION 

Networks are ubiquitous in all domains of science and 
technology, and permeate many aspects of daily human 
life [1, 2, 3, 4], especially upon the rise of the information
technology society [5, 6]. Our growing dependence
on them has inspired a burst of activity in the new field of 
network science, keeping researchers motivated to solve 
the difficult challenges that networks offer. Among these, 
the relation between network structure and function is 
perhaps the most important and fundamental. Transport 
is one of the most common functions of networked sys- 
tems. Examples can be found in many domains: trans- 
port of energy in metabolic networks, of mass in food 
webs, of people in transportation systems, of informa- 
tion in cell signalling processes, or of bytes across the 
Internet. 

In many of these examples, routing -or signalling of 
information propagation paths through a complex net- 
work maze- plays a determinant role in the transport 
properties of the system, in particular in such systems 
as the Internet or airport networks that have transport 
as their primary function. The observed efficiency of 
this routing process in real networks poses an intrigu- 
ing question: how is this efficiency achieved? When each 
element of the system has a full view of the global net- 
work topology, finding short routes to target destinations 
is a well-understood computational process. However, in 
many networks observed in nature, including those in so- 
ciety and biology (signalling pathways, neural networks, 
etc.), nodes efficiently find intended communication tar- 
gets even though they do not possess any global view 
of the system. For example, neural networks would not 
function so well if they could not route specific signals to 
appropriate organs or muscles in the body, although no 
neurone has a full view of global inter-neurone connec- 
tivity in the brain. 

In this work, we identify a general mechanism that 
explains routing conductivity, or navigability of real 
networks based on the concept of similarity between 



nodes [7, 8, 9, 10, 11, 12]. Specifically, intrinsic characteristics
of nodes define a measure of similarity between
them, which we abstract as a hidden distance.
Taken together, hidden distances define a hidden metric 
space for a given network. Our recent work shows that 
these spaces explain the observed structural peculiarities 
of several real networks, in particular social and technological
ones [13]. Here we show that this underlying
metric structure can be used to guide the routing pro- 
cess, leading to efficient communication without global 
information in arbitrarily large networks. Our analysis 
reveals that, remarkably, real networks satisfy the topo- 
logical conditions that maximise their navigability within 
this framework. Therefore, hidden metric spaces offer 
explanations of two open problems in complex networks 
science: the communication efficiency networks so often 
exhibit, and their unique structural characteristics. 



II. NODE SIMILARITY AND HIDDEN METRIC 
SPACES 

Our work is inspired by the seminal work of sociologist 
Stanley Milgram on the small world problem. The small 
world paradigm refers to the existence of short chains 
of acquaintances among individuals in societies [14]. At
Milgram's time, direct proof of such a paradigm was im- 
possible due to the lack of large databases of social con- 
tacts, so Milgram conceived an experiment to analyse 
the small world phenomenon in human social networks. 
Randomly chosen individuals in the United States were 
asked to route a letter to an unknown recipient using only 
friends or acquaintances that, according to their judge- 
ment, seemed most likely to know the intended recipient. 
The outcome of the experiment revealed that, without 
any global network knowledge, letters reached the tar- 
get recipient using, on average, 5.2 intermediate people, 
demonstrating that social acquaintance networks were in- 
deed small worlds. 

The small world property can be easily induced by 
FIG. 1: How hidden metric spaces influence the structure and function of complex networks. The smaller the
distance between two nodes in the hidden metric space, the more likely they are connected in the observable network topology.
If node A is close to node B, and B is close to C, then A and C are necessarily close because of the triangle inequality in
the metric space. Therefore, triangle ABC exists in the network topology with high probability, which explains the strong
clustering observed in real complex networks. The hidden space also guides the greedy routing process: if node A wants to reach
node F, it checks the hidden distances between F and its two neighbours B and C. Distance CF (green dashed line) is smaller
than BF (red dashed line), therefore A forwards information to C. Node C then performs similar calculations and selects its
neighbour D as the next hop on the path to F. Node D is directly connected to F. The result is path A → C → D → F
shown by green edges in the observable topology.



adding a small number of random connections to a "large 
world" network [15]. More striking is the fact that social
networks are navigable without global information.
Indeed, the only information that people used to make 
their routing decisions in Milgram's experiment was a set 
of descriptive attributes of the destined recipient, such as 
place of living and occupation. People then determined 
who among their contacts was "socially closest" to the 
target. The success of the experiment indicates that so- 
cial distances among individuals -even though they may 
be difficult to define mathematically- play a role in shap- 
ing the network architecture and that, at the same time, 
these distances can be used to navigate the network. 
However, it is not clear how this coupling between the 
structure and function of the network leads to efficiency 
of the search process, or what the minimum structural 
requirements are to facilitate such efficiency.

In this work, we show how network navigability de- 
pends on the structural parameters characterising the 
two most prominent and common properties of real com- 
plex networks: (1) scale-free (power-law) node degree dis- 
tributions characterising the heterogeneity in the number 
of connections that different nodes have, and (2) cluster- 
ing, a measure of the number of triangles in the network 



topology. We assume the existence of a hidden metric 
space, an underlying geometric frame that contains all 
nodes of the network, shapes its topology, and guides 
routing decisions, as illustrated in Fig. 1. Nodes are connected
in the observable topology, but a full view of their
global connectivity is not available at any node. Nodes 
are also positioned in the hidden metric space and identi- 
fied by their co-ordinates in it. Distances between nodes 
in this space abstract their similarity [7, 8, 9, 10, 11, 12].
These distances influence both the observable topology
and routing function: (1) the smaller the distance be- 
tween two nodes in the hidden space, i.e., the more sim- 
ilar the two nodes, the more likely they are connected in 
the observable topology; (2) nodes also use hidden dis- 
tances to select, as the next hop, the neighbour closest 
to the destination in the hidden space. Kleinberg introduced
the term greedy routing to describe this forwarding
process [16]. Greedy routing and its modifications have
been studied extensively in recent computer science literature
[17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
(see also Kleinberg's review [30] and references therein).
However, most of these works do not study greedy rout- 
ing on scale-free topologies, which are known as the com- 
mon signature of many large-scale self-evolving complex 
networks [1, 2, 3].

We use the class of network models developed in re- 
cent work [13]. They generate networks with topologies 
similar to those of real networks -small-world, scale-free, 
and with strong clustering- and, simultaneously, with 
hidden metric spaces lying underneath. The simplest 
model in this class (the details are in Appendix A) uses a
one-dimensional circle as the underlying metric space, in
which nodes are uniformly distributed. The model first
assigns to each node its expected degree $k$, drawn from a
power-law degree distribution $P(k) \sim k^{-\gamma}$, with $\gamma > 2$,
and then connects each pair of nodes with connection
probability $r(d; k, k')$ that depends both on the distance
$d$ between the two nodes in the circle and their assigned
degrees $k$ and $k'$,

$$r(d; k, k') = r(d/d_c) = (1 + d/d_c)^{-\alpha}, \tag{1}$$
where $\alpha > 1$ and $d_c \sim k k'$,

which means that the probability of link connection between
two nodes in the network decreases with the hidden
distance between them (as $\sim d^{-\alpha}$) and increases with
their degrees (as $\sim (kk')^{\alpha}$).

These two properties have a clear interpretation. The 
connection cost increases with hidden distance, thus dis- 
couraging long-range links. However, in making connec- 
tions, rich (well-connected, high-degree) nodes care less 
about distances (connection costs) than poor nodes. Further,
the characteristic distance scale $d_c$ provides a coupling
between node degrees and hidden distances, and ensures
the following three topological characteristics that
we commonly see in real networks. First, pairs of richly
connected, high-degree nodes -hubs- are connected with
high probability regardless of the hidden distance between
them because their characteristic distance $d_c$ is
so large that any actual distance $d$ between them will be
short in comparison: regardless of $d$, connection probability
$r$ in Eq. (1) is close to 1 if $d_c$ is large. Second, pairs
of low-degree nodes will not be connected unless the hidden
distance $d$ between them is short enough to compare
with the small value of their characteristic distance $d_c$.
Third, following similar arguments, pairs composed of
hubs and low-degree nodes are connected only if they are
located at moderate hidden distances.
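
The three regimes can be checked numerically with a minimal sketch of Eq. (1). The proportionality constant `c` in $d_c = c\,k k'$, the value $\alpha = 2$, and the degrees and distance used below are illustrative assumptions, not values taken from the paper.

```python
def connection_probability(d, k, k_prime, alpha=2.0, c=1.0):
    """Eq. (1): r = (1 + d/d_c)^(-alpha), assuming d_c = c * k * k'."""
    d_c = c * k * k_prime  # characteristic distance scale (c is hypothetical)
    return (1.0 + d / d_c) ** (-alpha)

d = 1000.0  # the same hidden distance for all three pairs (illustrative)
hub_hub = connection_probability(d, 500, 500)  # two hubs
hub_low = connection_probability(d, 500, 2)    # a hub and a low-degree node
low_low = connection_probability(d, 2, 2)      # two low-degree nodes

print(f"hub-hub: {hub_hub:.4f}, hub-low: {hub_low:.4f}, low-low: {low_low:.2e}")
```

At this distance the two hubs are almost certainly connected, the mixed pair is connected with moderate probability, and the two low-degree nodes are practically never connected.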

The parameter $\alpha$ in Eq. (1) determines the importance
of hidden distances for node connections. The larger $\alpha$,
the more preferred are connections between nodes close
in the hidden space. Consequently, the triangle inequality
in the metric space leads to stronger clustering in
the network, cf. Fig. 1. Clustering has a clear interpretation
in our approach as a reflection of the network's
metric strength: the more powerful is the influence of 
the network's underlying metric space on the observable 
topology, the more strongly it is clustered. 

Although our toy model is not designed to exactly 
match any specific real network, it generates graphs that 
are surprisingly similar to some real networks, such as 
the Internet at the autonomous system level or the USA 
airport network. See Appendix D for details.
FIG. 2: Average length of greedy-routing paths. The
left plot shows the average hop length of successful paths, $\tau$,
as a function of the network size $N$ for different values of $\gamma$
and $\alpha$. Results for values of $\gamma > 2.5$ look similar but with
longer paths and are omitted for clarity. In all cases, the
path length grows polylogarithmically with the network size:
the observed values of $\tau$ are fit well by $\tau(N) = A[\log N]^{\nu}$
(solid lines), where $A$ and $\nu$ are some constants. The right
plot shows $\tau$ as a function of $\gamma$ and $\alpha$ for networks of fixed
size $N \approx$ 10^. The effect of the two parameters on average
path length is straightforward: paths are shorter for smaller
exponents $\gamma$ and stronger clustering (larger $\alpha$).
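
A polylogarithmic fit of the form $\tau(N) = A[\log N]^{\nu}$ can be obtained by linear regression of $\log \tau$ on $\log \log N$. The sketch below is one possible way to do this, not necessarily the procedure used for the figure; it assumes natural logarithms, and the data points are synthetic, generated from known constants purely to check that the fit recovers them.

```python
import math

def fit_polylog(sizes, taus):
    """Least-squares fit of tau(N) = A * (ln N)^nu, done in log space."""
    xs = [math.log(math.log(n)) for n in sizes]  # x = ln ln N
    ys = [math.log(t) for t in taus]             # y = ln tau
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    nu = sxy / sxx                       # slope gives the exponent nu
    A = math.exp(mean_y - nu * mean_x)   # intercept gives ln A
    return A, nu

# Synthetic data (NOT measurements from the paper): A = 0.5, nu = 1.5.
sizes = [10**3, 10**4, 10**5]
taus = [0.5 * math.log(n) ** 1.5 for n in sizes]
A, nu = fit_polylog(sizes, taus)
print(A, nu)
```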



III. NAVIGABILITY OF MODELLED 
NETWORKS 



We use the model to generate scale-free networks with
different values of power-law degree distribution exponent
$\gamma$ and clustering strength $\alpha$, covering the observed
values in a vast majority of documented complex networks
[1, 2, 3]. We then simulate greedy routing for a
large sample of paths on all generated networks, and compare
the following two navigability parameters: 1) the
average hop length $\tau$ from source to destination of successful
greedy-routing paths, and 2) the success ratio $p_s$,
defined as the percentage of successful paths. Unsuccessful
paths are paths that get stuck at nodes without
neighbours closer to the destination in the hidden space
than themselves. These nodes usually have small degrees.
See Appendix B for simulation details.

Fig. 2 shows the impact of the network's degree distribution
and clustering on the average length $\tau$ of greedy-routing
paths. We observe a straightforward dependency:
paths are shorter for smaller exponents $\gamma$ and stronger
clustering (larger $\alpha$). The dependency of the success
ratio $p_s$ (the fraction of successful paths) on the two
topology parameters $\gamma$ and $\alpha$ is more intertwined. Fig. 3
shows that the effect of one parameter, $\gamma$, on the success
ratio depends on the other parameter, the level of clustering.
If clustering is weak (low $\alpha$), the percentage of
successful paths decays with network size $N$ regardless
of the value of $\gamma$ (Fig. 3, top-left). However, with strong
clustering (large $\alpha$), the percentage of successful paths
increases with $N$ and attains a maximum for large networks
if $\gamma < 2.6$, whereas it degrades for large networks
if $\gamma > 2.6$ (Fig. 3, bottom-left). Fig. 3, top-right, shows
this effect for networks of the same size ($N =$ 10^) with
different $\gamma$ and $\alpha$. The value of $\gamma = 2.6 \pm 0.1$ maximises
FIG. 3: Success probability of greedy routing. Left
plots: success probability $p_s$ as a function of network size $N$
for different values of $\gamma$ with weak (top) and strong (bottom)
clustering. The top-right plot shows $p_s$ as a function of $\gamma$
and $\alpha$ for networks of fixed size $N \approx$ 10^. In the bottom-right
plot, parameter $\alpha$ is mapped to clustering coefficient
$C$ [15] by computing $C$ for each network with given $\gamma$ and
$\alpha$. For each value of $C$, there is a critical value of $\gamma = \gamma_c(C)$
such that the success ratio in networks with this $C$ and
$\gamma > \gamma_c(C)$ decreases with the network size ($p_s(N) \to 0$ as $N \to \infty$), while
$p_s(N)$ reaches a constant value for large $N$ in networks with
$\gamma < \gamma_c(C)$. The solid line in the plot shows these critical
values $\gamma_c(C)$, separating the low-$\gamma$, high-$C$ navigable region,
in which greedy routing remains efficient in the large-graph
limit, from the high-$\gamma$, low-$C$ non-navigable region, where
the efficiency of greedy routing degrades for large networks.
The plot labels measured values of $\gamma$ and $C$ for several real
complex networks. Internet is the global Internet topology of
autonomous systems as seen by the Border Gateway Protocol
(BGP) [31]; Web of trust is the Pretty Good Privacy (PGP)
social network of mutual trust relationships [32]; Metabolic is
the network of metabolic reactions of E. coli [33]; and Airports
is the network of the public air transportation system [34].



the number of successful paths once clustering is above
a threshold, $\alpha > 1.5$. These observations mean that for
a fixed clustering strength, there is a critical value of the
exponent $\gamma$ (Fig. 3, bottom-right) below which networks
remain navigable as their size increases, but above which
their navigability deteriorates with their size.

In summary, strong clustering improves both navigability
metrics. We also find a delicate trade-off between
values of $\gamma$ close to 2 minimising path lengths, and higher
values, not exceeding $\gamma \approx 2.6$, maximising the percentage
of successful paths. We explain these findings
in the next section, but we note here that qualitatively,
this navigable parameter region contains a majority of
complex networks observed in reality [1, 2, 3], as confirmed
in Fig. 3 (bottom-right), where we juxtapose a few
paradigmatic examples of communication, social, biological,
and transportation networks vs. the identified navigable
region of clustering and degree distribution exponent.
Interestingly, power grids, which propagate electricity
rather than route information, are neither scale-
free nor clustered [111 [35] . 



IV. AIR TRAVEL BY GREEDY ROUTING AS 
AN EXPLANATION 

We illustrate the greedy routing function, and the 
structure of networks conducive to such routing, with
an example of passenger air travel. Suppose we want 
to travel from Toksook Bay, Alaska, to Ibiza, Spain, by 
the public air transportation network. Nodes in this net- 
work are airports, and two airports are connected if there 
is at least one flight between them. We travel accord- 
ing to the greedy routing strategy using geography as 
the underlying metric space. At each airport we choose 
the next-hop airport geographically closest to the desti- 
nation. Under these settings, our journey goes first to 
Bethel, then to Anchorage, to Detroit, over the Atlantic 
to Paris, then to Valencia and finally to Ibiza; see Fig. 4.
The sequence and sizes of airport hops reveal the struc- 
ture of our greedy-routing path. The path proceeds from 
a small airport to a local hub at a small distance, from 
there to a larger hub at a larger distance, and so on un- 
til we reach Paris. At that point, when the distance to 
the destination becomes sufficiently small, greedy routing 
leads us closer to our final destination by choosing not 
another hub, but a less connected neighbouring airport. 

We observe that the navigation process has two, some- 
what symmetric phases. The first phase is a coarse- 
grained search, travelling longer and longer distances per 
hop toward hubs, thus "zooming out" from the starting 
point. The second phase corresponds to a fine-grained 
search, "zooming in" onto the destination. The turning 
point between the two phases appears naturally: once we 
are in a hub near the destination, the probability that it 
is connected to a bigger hub closer to the destination 
sharply decreases, but at this point we do not need hubs 
anyway, and greedy routing directs us to smaller airports 
at shorter distances next to the destination. 

This zoom out/zoom in mechanism works efficiently 
only if the coupling between the airport network topol- 
ogy and the underlying geography satisfies the follow- 
ing two conditions: the sufficient hubs condition and 
the sufficient clustering condition. The first condition 
ensures that a network has enough hub airports (high- 
degree nodes) to provide an increasing sequence during 
the zoom out phase. This condition is fulfilled by the real 
airport network and by other scale-free networks with 
small values of degree distribution exponent $\gamma$, because
the smaller the $\gamma$, the larger the proportion of hubs in
the network. 
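
The dependence of the proportion of hubs on $\gamma$ can be made concrete with a short worked calculation. It assumes the power-law expected-degree density of Appendix A, $\rho(\kappa) = (\gamma-1)\kappa_0^{\gamma-1}\kappa^{-\gamma}$ for $\kappa > \kappa_0$, with the average degree $\langle k \rangle = 6$ used in Appendix B. The hub threshold $K = 100$ is an arbitrary illustrative choice.

```latex
% Fraction of nodes whose expected degree exceeds a threshold K
P(\kappa > K) = \int_K^{\infty} (\gamma-1)\,\kappa_0^{\gamma-1}\,\kappa^{-\gamma}\,d\kappa
              = \left(\frac{\kappa_0}{K}\right)^{\gamma-1},
\qquad \kappa_0 = \frac{(\gamma-2)\langle k \rangle}{\gamma-1}.

% Example with <k> = 6 and K = 100:
% gamma = 2.2:  kappa_0 = 1,  P = 100^{-1.2} \approx 4.0 \times 10^{-3}
% gamma = 3.0:  kappa_0 = 3,  P = (0.03)^{2} = 9.0 \times 10^{-4}
```

Under these assumptions, lowering $\gamma$ from 3 to 2.2 at fixed average degree increases the fraction of such hubs more than fourfold.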

However, the presence of many hubs does not ensure 
that greedy routing will use them. Unlike humans, who 
can use their knowledge of airport size to selectively 
travel via hub airports, greedy routing uses only one con- 
FIG. 4: Greedy routing in the airport network. Top: the structure of the single greedy-routing path from
Toksook Bay to Ibiza. At each intermediate airport, the next hop is the airport closest to Ibiza geographically. Sizes
of symbols representing the airports are proportional to the logarithm of their degrees. Bottom left: the
changing distance to Ibiza (x axis) vs. the degree of the visited airports (y axis, logarithmic scale). Bottom right:
the structure of greedy-routing paths between a collection of airports in the USA [36]. We include an airport pair
in the collection if the distance between the airports is between 3900 and 4100 kilometers. The number of airport pairs in this
collection is 7620. We use colour to indicate how often paths in the collection go through an airport of a given degree located
at a given geographical distance from the destination: blue/red indicates exponentially fewer/more visits to those airports, or
more specifically, the colour is the logarithm of the normalised density of visited airports.



straint at each hop: minimise distance to the destination. 
Therefore, the network topology must satisfy the second 
condition, which ensures that Bethel is larger than Tok- 
sook Bay, Anchorage larger than Bethel, and so on. More 
generally, this condition is that the next greedy hop from 
a remote low-degree node likely has a higher degree, so 
that greedy paths typically head first toward the highly 
connected network core. But the network metric strength 
is exactly the required property: preference for connec- 



tions between nodes nearby in the hidden space means 
that low-degree nodes are less likely to have connectivity 
to distant low-degree nodes; only high-degree nodes can 
have long-range connections that greedy routing will effectively
select. The stronger this coupling between the
metric space and topology (the higher $\alpha$ in Eq. (1)), the
stronger the clustering in the network.

To illustrate, imagine an airport network without suf- 
ficient clustering, one where the airport closest to our 





FIG. 5: Probability that greedy routing travels to
higher-degree nodes. More precisely, the probability
$P_{up}(k, d)$ that the greedy-routing next hop after a node of
degree $k$ located at distance $d$ from a destination has higher
degree $k' > k$ and is closer to the destination. The distance
legend in the bottom-right plot applies to all the plots. The
results are for the large-graph limit $N \to \infty$.



destination (Ibiza) among all airports connected to our 
current node (Toksook Bay, Alaska) is not Bethel, which 
is bigger than Toksook Bay, but Nightmute, Alaska, a 
nearby airport of comparable size to Toksook Bay. As 
greedy routing first leads us to Nightmute, then to an- 
other small nearby airport, and then to another, we can 
no longer get to Ibiza in few hops. Worse, travelling via 
these numerous small airports, we could reach one with 
no connecting flights heading closer to Ibiza. Our greedy 
routing would be stuck at this airport with an unsuccess- 
ful path. 

These factors explain why the most navigable topolo- 
gies correspond to scale-free networks with small expo- 
nents of the degree distribution, i.e., a large number of 
hubs, and with strong clustering, i.e., strong coupling be- 
tween the hidden geometry and the observed topology. 






FIG. 6: The structure of greedy-routing paths. We
visualise the results of our simulation of greedy routing in
modelled networks with different values of $\gamma$ and $\alpha$ observed
in real complex networks. The hidden distance between the
starting point and the destination is always approximately
10^, and the network size $N$ and number of attempted paths
are always 10^ for each $(\gamma, \alpha)$ combination, but the number of
successful paths and path hop-lengths vary, cf. Figs. 2 and 3. All
paths start and end at low-degree nodes located, respectively,
in the left- and right-bottom corners of the diagrams (see top
left plot). For each $(\gamma, \alpha)$ we depict a single typical path in
black and, as in Fig. 4, use colour to indicate how often paths
included a node of a given degree located at a given distance
from the destination. The simulations confirm that only when
$\gamma$ is small and $\alpha$ is large does the average path structure follow
the zoom-out/zoom-in pattern that characterises successful
greedy routing in real networks, e.g., in the airport network
in Fig. 4.



V. THE STRUCTURE OF GREEDY-ROUTING 
PATHS 

We observe the discussed zoom-out/zoom-in mechanism
in analytical calculations and numerical simulations.
Specifically, we calculate (in Appendix If]) the



probability that the next hop from a node of degree k 
located at hidden distance d from the destination has a 
larger degree k' > k, in which case the path moves toward 
the high-degree network core; see Fig. 5. In the most
navigable case, with small degree-distribution exponent 
and strong clustering, the probability of increasing the 
node degree along the path is high at low-degree nodes,
and sharply decreases to zero after reaching a node of
a critical degree value, which increases with distance d. 
This observation implies that greedy-routing paths first 
propagate up to higher-degree nodes in the network core 
and then exit the core toward low-degree destinations in 
the periphery. In contrast, with low clustering, paths are 
less likely to find higher-degree nodes regardless of the 
distance to the destination. This path structure violates 
the zoom-out/zoom-in pattern required for efficient navigation.

Fig. 6 shows the structure of greedy-routing paths
in simulations, further confirming our analysis. We 
again see that for small degree-distribution exponents 
and strong clustering (upper left and middle left), the 
routing process quickly finds a way to the high-degree 
core, makes a few hops there, and then descends to a low- 
degree destination. In the other, non-navigable cases, the 
process can almost never get to the core of high-degree 
nodes. Instead, it wanders in the low-degree periphery 
increasing the probability of getting lost at low-degree 
nodes. 



VI. DISCUSSION 

Our main motivation for this work comes from long- 
standing scalability problems with the Internet routing 
architecture [37]. To route information packets to a given
destination, Internet routers must communicate to main- 
tain a coherent view of the global Internet topology. The 
constantly increasing size and dynamics of the Internet 
thus lead to immense and quickly growing communication
and information processing overhead, a major bottleneck
in routing scalability [33], causing concerns among
Internet experts that the existing Internet routing architecture
may not sustain even another decade [37]. Discovery
of the Internet's hidden metric space would remove
this bottleneck, eliminating the need for the inherently
unscalable communication of topology changes.
Instead routers would be able to just forward packets 
greedily to the destination based on hidden distances. 

In a similar manner, reconstruction of hidden metric 
spaces underlying other real networks may prove prac- 
tically useful. For example, in social or communication 
networks (e.g., the Web, overlay, or online social net- 
works) hidden spaces would yield efficient strategies for 
searching specific individuals or content based only on 
local knowledge. The metric spaces hidden under some 
biological networks (such as neural or gene regulatory networks,
signalling or even protein folding [39] pathways)
can become a powerful tool in studying the structure of 
information or signal flows in these networks, enabling 
investigation of such processes without detailed global 
knowledge of the network structure or organisation. 

The natural question we thus face is how to proceed to- 
ward discovery of the explicit structure of hidden metric 
spaces underlying real networks. We do not expect spaces 
underlying different networks to be exactly the same. 



For example, the similarity spaces of Web pages [8] and
Wikipedia editors [11] likely differ. However, the main
contribution of this work establishes the general mecha- 
nisms behind navigability of scale-free, strongly clustered 
topologies that characterise many different real networks. 
The next step is to find the common properties of hid- 
den spaces that render them congruent with these mech- 
anisms. Specifically, we are interested in what geometries 
of hidden spaces lead to such congruency |1D] . 

In general, we believe that the present and future work 
on hidden metric spaces and network navigability will 
deepen our understanding of the fundamental laws de- 
scribing relationships between structure and function of 
complex networks. 
APPENDIX A: A MODEL WITH THE CIRCLE
AS A HIDDEN METRIC SPACE. 

In our model we place all nodes on a circle by assigning
them a random variable $\theta$, i.e., their polar angle, distributed
uniformly in $[0, 2\pi)$. The circle radius $R$ grows
linearly with the total number of nodes $N$, $2\pi R = N$,
in order to keep the average density of nodes on the circle
fixed to 1. We next assign to each node its expected
degree $\kappa$ drawn from some distribution $\rho(\kappa)$. The connection
probability between two nodes with hidden coordinates
$(\theta, \kappa)$ and $(\theta', \kappa')$ takes the form

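$$r(\theta, \kappa; \theta', \kappa') = \left(1 + \frac{d(\theta, \theta')}{\mu \kappa \kappa'}\right)^{-\alpha}, \qquad \mu = \frac{\alpha - 1}{2\langle k \rangle},$$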

where $d(\theta, \theta')$ is the geodesic distance between the two
nodes on the circle, while $\langle k \rangle$ is the average degree. One
can show that the average degree of nodes with hidden
variable $\kappa$, $\bar{k}(\kappa)$, is proportional to $\kappa$ [41]. This proportionality
guarantees that the shape of the node degree
distribution $P(k)$ in generated networks is approximately
the same as the shape of $\rho(\kappa)$. The choice of

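$$\rho(\kappa) = (\gamma - 1)\,\kappa_0^{\gamma - 1}\,\kappa^{-\gamma}, \qquad \kappa > \kappa_0 \equiv (\gamma - 2)\langle k \rangle / (\gamma - 1),$$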

$\gamma > 2$, generates random networks with a power-law degree
distribution of the form $P(k) \sim k^{-\gamma}$, where $\gamma$ is
a model parameter that regulates the heterogeneity of
the degree distribution in the network. This parameter
abstracts the heterogeneity of node degrees in real
networks, where degree distributions may not perfectly
follow power laws, or may exhibit various forms of high-degree
cut-offs [31, 42]. The specific effects are less important
than the overall measure of heterogeneity. We
note that instead of a circle in our model we could use
any isotropic space of any dimension [13].
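
The construction described in this appendix can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the connection probability $r = (1 + d/(\mu\kappa\kappa'))^{-\alpha}$ with $\mu = (\alpha-1)/(2\langle k \rangle)$ and the power-law density $\rho(\kappa)$ with lower cut-off $\kappa_0 = (\gamma-2)\langle k \rangle/(\gamma-1)$. The function name, default parameter values, and the brute-force loop over all node pairs (quadratic in $N$, so suitable only for small networks) are choices made here for clarity.

```python
import math
import random

def generate_network(n, gamma=2.5, alpha=2.0, mean_k=6.0, seed=0):
    """Sketch of the circle model: returns (theta, kappa, adjacency sets)."""
    rng = random.Random(seed)
    radius = n / (2 * math.pi)                      # 2*pi*R = N, node density 1
    kappa0 = (gamma - 2) * mean_k / (gamma - 1)     # lower cut-off of rho(kappa)
    mu = (alpha - 1) / (2 * mean_k)                 # assumed normalisation constant
    theta = [2 * math.pi * rng.random() for _ in range(n)]  # uniform in [0, 2*pi)
    # Inverse-transform sampling of the power law; 1 - random() lies in (0, 1].
    kappa = [kappa0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1)) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dtheta = abs(theta[i] - theta[j])
            d = radius * min(dtheta, 2 * math.pi - dtheta)   # geodesic distance
            p = (1.0 + d / (mu * kappa[i] * kappa[j])) ** (-alpha)
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return theta, kappa, adj

theta, kappa, adj = generate_network(200)
print("average degree:", sum(len(nbrs) for nbrs in adj) / len(adj))
```

For small $N$ the measured average degree fluctuates noticeably from run to run because the expected degrees are heavy-tailed.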



APPENDIX B: NUMERICAL SIMULATIONS. 

Our model has three independent parameters: exponent $\gamma$ of power-law
degree distributions, clustering strength $\alpha$, and average degree
$\langle k \rangle$. We fix the latter to 6, which is roughly equal to the
average degree of some real networks of interest [31, 35], and vary
$\gamma \in [2.1, 3]$ and $\alpha \in [1.1, 5]$, covering their observed
ranges in documented complex networks [1, 2, 3]. For each
$(\gamma, \alpha)$ pair, we produce networks of different sizes
$N \in [10^3, 10^5]$, generating, for each $(\gamma, \alpha, N)$, a number
of different network instances, from 40 for large $N$ to 4000 for small
$N$. In each network instance $G$, we randomly select $10^6$
source-destination pairs $(a, b)$ and execute the greedy-routing process
for them, starting at $a$ and selecting, at each hop $h$, the next hop as
$h$'s neighbour in $G$ closest to $b$ on the circle. If, for a given
$(a, b)$, this process visits the same node twice, then the corresponding
path leads to a loop and is unsuccessful. We then average the measured
values of path hop lengths $\tau$ and percentage of successful paths $p_s$
across all pairs $(a, b)$ and networks $G$ for the same
$(\gamma, \alpha, N)$. Note that we are not concerned with the absolute
values of the success ratio $p_s$. Instead we use it as a measure of
navigability to compare networks with different $(\gamma, \alpha, N)$. For
this purpose we could use the success ratio of any (improved) modification
of standard greedy routing.
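
The routing experiment described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the simulations: the graph is assumed to be given as an adjacency mapping `adj`, node positions as angles `theta` in radians, and all function names are hypothetical.

```python
import math

def circle_distance(theta_a: float, theta_b: float) -> float:
    """Angular distance between two points on a circle (radians)."""
    diff = abs(theta_a - theta_b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)

def greedy_route(adj, theta, source, target):
    """Hop count of the greedy path, or None if it loops or dead-ends."""
    current, hops, visited = source, 0, {source}
    while current != target:
        neighbours = adj[current]
        if not neighbours:
            return None  # isolated node: no next hop exists
        # Next hop: the current node's neighbour closest to the target.
        nxt = min(neighbours, key=lambda v: circle_distance(theta[v], theta[target]))
        if nxt in visited:
            return None  # same node visited twice: the path is a loop
        visited.add(nxt)
        current = nxt
        hops += 1
    return hops

def navigability(adj, theta, pairs):
    """Success ratio p_s and mean hop length tau over successful paths."""
    lengths = [greedy_route(adj, theta, a, b) for a, b in pairs]
    ok = [h for h in lengths if h is not None]
    p_s = len(ok) / len(pairs)
    tau = sum(ok) / len(ok) if ok else float("nan")
    return p_s, tau
```

Averaging `navigability` over many network instances generated with the same $(\gamma, \alpha, N)$ would give the quantities $p_s$ and $\tau$ compared in the text.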



APPENDIX C: SHORTEST PATH VS. SHORTEST 
TIME. 

All results derived in the present paper are about find- 
ing short paths across a network topology. The total 
physical time from source to destination is implicitly as- 
sumed to be proportional to the number of hops. In 
real transportation systems, e.g. the Internet or the air- 
port network, the finite capacity of nodes implies that 
the end-to-end path latency may be longer when inter- 
mediate nodes are congested. While our results most 
cleanly apply to uncongested systems, there are obvious 
modifications, such as choosing the second or third near- 
est rather than the nearest neighbor, that could still find 
nearly shortest paths while reducing and balancing load 
on the system. 
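
One way to read the modification mentioned above is as a rank-based choice of the next hop. The sketch below only illustrates that selection rule; the `rank` parameter, the fallback for nodes with few neighbours, and the function names are assumptions introduced here, and whether such a rule preserves short paths under load is not tested by this snippet.

```python
import math

def circle_distance(theta_a: float, theta_b: float) -> float:
    """Angular distance between two points on a circle (radians)."""
    diff = abs(theta_a - theta_b) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)

def ranked_next_hop(adj, theta, current, target, rank=1):
    """Neighbour with the rank-th smallest hidden distance to the target.

    rank=1 reproduces standard greedy routing. If the node has fewer than
    `rank` neighbours, the farthest-ranked available neighbour is returned;
    None is returned for a node without neighbours.
    """
    ordered = sorted(adj[current],
                     key=lambda v: circle_distance(theta[v], theta[target]))
    if not ordered:
        return None
    return ordered[min(rank, len(ordered)) - 1]
```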



APPENDIX D: THE MODEL VS. REAL 
NETWORKS: THE AUTONOMOUS SYSTEM 
LEVEL MAP OF THE INTERNET AND THE US 
AIRPORT NETWORK 



The model we use in this work is not meant to reproduce any particular
system but to generate a set of general properties, like heterogeneous
degree distributions, high clustering, and a metric structure lying
underneath. Yet, despite its simplistic assumptions, the model generates
graphs that are surprisingly close to some real networks of interest, in
particular the Internet at the Autonomous System (AS) level [31, 35] and
the network of airline connections among airports within the United States
during 2006 (USAN) [36]. In the case of the Internet, we use two different
data sets, the Internet as viewed by the Border Gateway Protocol (BGP) [31]
and the DIMES project [43]. The BGP (DIMES) network has a size of
$N = 17446$ ($N = 19499$) ASs, average degree $\langle k \rangle = 4.7$
($\langle k \rangle = 5$), and average clustering $C = 0.41$ ($C = 0.6$).
The US Airport Network is composed of US airports connected by regular
flights (with more than 1000 passengers per year) during the year 2006.
This results in a network of $N = 599$ airports, average degree
$\langle k \rangle \approx 10.8$ and average clustering coefficient
$C = 0.72$.

Figs. 7 and 8 show a comparison of the basic topological properties of
these networks with graphs generated with the model. In the case of the AS
map, we use a truncated power law distribution
$\rho(\kappa) \sim \kappa^{-\gamma}$, $\kappa < \kappa_c$, with exponent
$\gamma = 2.1$ and $\kappa_c$ such that the maximum degree of the network
is 2400. For the USAN, we use $\gamma = 1.6$ and a maximum degree
$k_c = 180$, as observed in the real network. As can be seen in both
figures, the match between the model and the empirical data is surprisingly
good except for very low degree vertices. This is particularly interesting
since we are not enforcing any mechanism to reproduce higher order
statistics like the average nearest neighbours' degree $\bar{k}_{nn}(k)$ or
the degree-dependent clustering coefficient $c(k)$. This can be understood
as a consequence of the high heterogeneity of the degree distribution,
which introduces structural constraints in the network [44, 45].
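
The two higher-order statistics compared in Figs. 7 and 8 can be computed directly from an adjacency structure. The following is a minimal sketch, assuming an undirected simple graph stored as a dict of neighbour sets; the function name and the convention that degree-1 nodes have zero clustering are choices made here for illustration.

```python
from collections import defaultdict
from itertools import combinations

def degree_correlations(adj):
    """Return (knn, ck): dicts mapping degree k to the average nearest
    neighbours' degree and to the average clustering of degree-k nodes."""
    knn_sum = defaultdict(float)
    c_sum = defaultdict(float)
    count = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes contribute to neither statistic
        count[k] += 1
        knn_sum[k] += sum(len(adj[v]) for v in nbrs) / k
        if k > 1:
            # Fraction of neighbour pairs that are themselves connected.
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            c_sum[k] += 2.0 * links / (k * (k - 1))
    knn = {k: knn_sum[k] / count[k] for k in count}
    ck = {k: c_sum[k] / count[k] for k in count}
    return knn, ck
```

For a triangle with one pendant node attached, the degree-3 node has clustering 1/3 and nearest neighbours' degree 5/3, which is a convenient hand check.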

The airport network differs in several ways from our modelled networks:
the distribution of airports in the geographic space is far from uniform;
the airport degree distribution does not perfectly follow a power law; and
it exhibits a sharp high-degree cut-off. However, the structure of greedy
paths is surprisingly similar to that in our modelled networks in Fig. 6.
The success ratio $p_s \approx 0.64$ and average length of successful paths
$\tau \approx 2.1$ are also similar to those in our modelled networks of
the corresponding size, clustering, and degree distribution exponent. These
similarities indicate that the network navigability characteristics depend
on clustering and heterogeneity of the airport degree distribution, and
less so on how perfectly it follows a power law.




FIG. 7: Degree distribution $P(k)$, average nearest neighbours' degree
$\bar{k}_{nn}(k)$, and degree-dependent clustering coefficient $c(k)$
generated by our model with $\gamma = 2.1$ and $\alpha = 2$ compared to the
same metrics for the real Internet map as seen by BGP data and the DIMES
project.



FIG. 8: Degree distribution $P(k)$, average nearest neighbours' degree
$\bar{k}_{nn}(k)$, and degree-dependent clustering coefficient $c(k)$
generated by our model with $\gamma = 1.6$, $\alpha = 5$ and a cut-off at
$k_c = 180$ compared to the same metrics for the real US airport network.



APPENDIX E: HIERARCHICAL 
ORGANIZATION OF MODELED NETWORKS 

The routing process in our framework resembles guided searching for a
specific object in a complex collection of objects. Perhaps the simplest
and most general way to make a complex collection of heterogeneous objects
searchable is to classify them in a hierarchical fashion. By
"hierarchical," we mean that the whole collection is split into categories
(i.e., sets), sub-categories, sub-sub-categories, and so on. Relationships
between categories form (almost) a tree, whose leaves are individual
objects in the collection [7, 12, 40]. Finding an object reduces to the
simpler task of navigating this tree.

$k$-core decomposition [ITJ |35] is possibly the most suitable generic tool
to expose hierarchy within our modeled networks. The $k$-core of a network
is its maximal subgraph such that all the nodes in the subgraph have $k$ or
more connections to other nodes in the subgraph. A node's coreness is the
maximum $k$ such that the $k$-core contains the node but the $(k+1)$-core
does not. The $k$-core structure of a network is a form of hierarchy since
a $(k+1)$-core is a subset of a $k$-core. One can estimate the quality of
this hierarchy using properties of the $k$-core spectrum, i.e., the
distribution of $k$-core sizes. If the maximum node coreness is large and
if there is a rich collection of comparably-sized $k$-cores with a wide
spectrum of $k$'s, then this hierarchy is deep and well-developed, making
it potentially more navigable. Otherwise it is poor and non-navigable.
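
Node coreness as defined above can be obtained by repeatedly peeling off a node of minimum remaining degree. The sketch below uses a simple quadratic-time version of that peeling procedure for clarity; it is not the LaNet-vi implementation, and the function names are illustrative.

```python
from collections import Counter

def coreness(adj):
    """Coreness of every node, by repeatedly removing a minimum-degree node.

    `adj` maps each node to an iterable of its neighbours (undirected graph).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)
        # The coreness is the largest minimum degree seen so far while peeling.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def core_spectrum(core):
    """S(k): number of nodes whose coreness equals k."""
    return dict(Counter(core.values()))
```

A wide `core_spectrum` with a large maximum key corresponds to the deep hierarchy described in the text, while a spectrum concentrated on one or two small values corresponds to a shallow one.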

In Fig. 9 we feed real and modeled networks to the Large Network
visualization tool (LaNet-vi) [46], which utilizes node coreness to
visualize the network. Fig. 9 shows that networks with stronger clustering
and smaller exponents of degree distribution possess stronger $k$-core
hierarchies. These hierarchies are directly related to how networks are
constructed in our model, since nodes with higher $\kappa$ and,
consequently, higher degrees have generally higher coreness, as we can
partially see in Fig. 9.

FIG. 9: $k$-core decompositions of real and modeled networks. The first two
rows show LaNet-vi [46] network visualizations. All nodes are color-coded
based on their coreness (right legends) and size-coded based on their
degrees (left legends). Higher-coreness nodes are closer to circle centers.
The third row shows the $k$-core spectrum, i.e., the distribution $S(k)$ of
sizes of node sets with coreness $k$. The first column depicts two real
networks: the AS-level Internet as seen by the Border Gateway Protocol
(BGP) in [31] and the Pretty Good Privacy (PGP) social network from [32].
The rest of the columns show modeled networks for different values of
power-law exponent $\gamma$ in cases with weak ($\alpha = 1.1$) and strong
($\alpha = 5.0$) clustering. The network size $N$ for all real and modeled
cases is approximately $10^4$. Similarity between real networks and modeled
networks with low $\gamma$ and high $\alpha$ is remarkable.



APPENDIX F: THE ONE-HOP PROPAGATOR 
OF GREEDY ROUTING 

To derive the greedy-routing propagator in this appendix, we adopt a
slightly more general formalism than in the main text. Specifically, we
assume that nodes live in a generic metric space $\mathcal{H}$ and, at the
same time, have intrinsic attributes unrelated to $\mathcal{H}$. Contrary
to normed spaces or Riemannian manifolds, generic metric spaces do not
admit any coordinates, but we still use the coordinate-based notations here
to simplify the exposition below, and denote by $\mathbf{x}$ nodes'
coordinates in $\mathcal{H}$ and by $\omega$ all their other, non-geometric
attributes, such as their expected degree $\kappa$. In other words, hidden
variables $\mathbf{x}$ and $\omega$ in this general formalism represent
some collections of nodes' geometric and non-geometric hidden attributes,
not just a pair of scalar quantities. Therefore, integrations over
$\mathbf{x}$ and $\omega$ in what follows stand merely to denote an
appropriate form of summation in each concrete case.

As in the main text, we assume that $\mathbf{x}$ and $\omega$ are
independent random variables so that the probability density to find a node
with hidden variables $(\mathbf{x}, \omega)$ is

$$\rho(\mathbf{x}, \omega) = \delta(\mathbf{x})\rho(\omega)/N, \tag{F1}$$

where $\rho(\omega)$ is the probability density of the $\omega$ variables
and $\delta(\mathbf{x})$ is the concentration of nodes in $\mathcal{H}$.
The total number of nodes is

$$N = \int_{\mathcal{H}} \delta(\mathbf{x})\,d\mathbf{x}, \tag{F2}$$

and the connection probability between two nodes is an integrable
decreasing function of the hidden distance between them,

$$r(\mathbf{x}, \omega; \mathbf{x}', \omega') = r[d(\mathbf{x}, \mathbf{x}')/d_c(\omega, \omega')], \tag{F3}$$

where $d_c(\omega, \omega')$ is a characteristic distance scale that
depends on $\omega$ and $\omega'$.

We define the one-step propagator of greedy routing as the probability
$G(\mathbf{x}', \omega'|\mathbf{x}, \omega, \mathbf{x}_t)$ that the next
hop after a node with hidden variables $(\mathbf{x}, \omega)$ is a node
with hidden variables $(\mathbf{x}', \omega')$, given that the final
destination is located at $\mathbf{x}_t$.

To further simplify the notations below, we label the set of variables
$(\mathbf{x}, \omega)$ as a generic hidden variable $h$ and undo this
notation change at the end of the calculations according to the following
rules:

$$\begin{aligned}
(\mathbf{x}, \omega) &\longrightarrow h \\
\rho(\mathbf{x}, \omega) &\longrightarrow \rho(h) \\
d\mathbf{x}\,d\omega &\longrightarrow dh \\
r(\mathbf{x}, \omega; \mathbf{x}', \omega') &\longrightarrow r(h, h').
\end{aligned} \tag{F4}$$

We begin the propagator derivation assuming that a particular network
instance has a configuration given by
$\{h, h_t, h_1, \ldots, h_{N-2}\} = \{h, h_t; \{h_j\}\}$ with
$j = 1, \ldots, N-2$, where $h$ and $h_t$ denote the hidden variables of
the current hop and the destination, respectively. In this particular
network configuration, the probability that the current node's next hop is
a particular node $i$ with hidden variable $h_i$ is the probability that
the current node is connected to $i$ but disconnected from all nodes that
are closer to the destination than $i$,

$$\mathrm{Prob}(i|h, h_t; \{h_j\}) = r(h, h_i) \prod_{j(\neq i)=1}^{N-2} \left[1 - r(h, h_j)\right]^{\Theta[d(h_i, h_t) - d(h_j, h_t)]}, \tag{F5}$$

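where $\Theta(\cdot)$ is the Heaviside step function. Taking the average
over all possible configurations
$\{h_1, \ldots, h_{i-1}, h_{i+1}, \ldots, h_{N-2}\}$ excluding node $i$, we
obtain

$$\mathrm{Prob}(i|h, h_t; h_i) = r(h, h_i) \left(1 - \frac{\bar{k}(h|h_i, h_t)}{N-3}\right)^{N-3}, \tag{F6}$$

where

$$\bar{k}(h|h_i, h_t) = (N-3) \int_{d(h', h_t) < d(h_i, h_t)} \rho(h')\,r(h, h')\,dh' \tag{F7}$$

is the average number of connections between the current node and nodes
closer to the destination than node $i$, excluding $i$ and $t$.

The probability that the next hop has hidden variable $h'$, regardless of
its label, i.e., index $i$, is

$$\mathrm{Prob}(h'|h, h_t) = \sum_{i=1}^{N-2} \rho(h')\,\mathrm{Prob}(i|h, h_t; h'). \tag{F8}$$

In the case of sparse networks, $\bar{k}(h|h', h_t)$ is a finite quantity.
Taking the limit of large $N$, the above expression simplifies to

$$\mathrm{Prob}(h'|h, h_t) = N \rho(h')\,r(h, h')\,e^{-\bar{k}(h|h', h_t)}. \tag{F9}$$

Yet, this equation is not a properly normalized probability density
function for the variable $h'$ since node $h$ can have degree zero with
some probability. If we consider only nodes with degrees greater than zero,
then the normalization factor is given by $1 - e^{-\bar{k}(h)}$, where
$\bar{k}(h)$ is the expected degree of a node with hidden variable $h$.
Therefore, the properly normalized propagator is finally

$$G(h'|h, h_t) = \frac{N \rho(h')\,r(h, h')\,e^{-\bar{k}(h|h', h_t)}}{1 - e^{-\bar{k}(h)}}. \tag{F10}$$

We now undo the notation change and express this propagator in terms of our
mixed coordinates:

$$G(\mathbf{x}', \omega'|\mathbf{x}, \omega, \mathbf{x}_t) = \frac{\delta(\mathbf{x}')\rho(\omega')}{1 - e^{-\bar{k}(\mathbf{x}, \omega)}}\; r\!\left[\frac{d(\mathbf{x}, \mathbf{x}')}{d_c(\omega, \omega')}\right] e^{-\bar{k}(\mathbf{x}, \omega|\mathbf{x}', \mathbf{x}_t)}, \tag{F11}$$

with

$$\bar{k}(\mathbf{x}, \omega|\mathbf{x}', \mathbf{x}_t) = \int_{d(\mathbf{x}', \mathbf{x}_t) > d(\mathbf{y}, \mathbf{x}_t)} d\mathbf{y} \int d\omega'\, \delta(\mathbf{y})\rho(\omega')\; r\!\left[\frac{d(\mathbf{x}, \mathbf{y})}{d_c(\omega, \omega')}\right]. \tag{F12}$$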
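
Two steps in the derivation of (F9) and (F10) are left implicit. They can be sketched as follows, where the shell radius $s$ and the function $K(s)$ are auxiliary notation introduced here, and the distribution of distances to the destination is assumed to have no point masses:

```latex
% Step (F8) -> (F9): by (F6), all N-2 terms of the sum coincide once h_i = h'.
\begin{align*}
\mathrm{Prob}(h'|h,h_t)
  &= (N-2)\,\rho(h')\,r(h,h')
     \left(1-\frac{\bar{k}(h|h',h_t)}{N-3}\right)^{N-3} \\
  &\xrightarrow[N\to\infty]{}\;
     N\rho(h')\,r(h,h')\,e^{-\bar{k}(h|h',h_t)} .
\end{align*}

% Normalization in (F10): K(s) is the expected number of neighbours of h
% lying within distance s of the destination, so K(d(h',h_t)) = \bar{k}(h|h',h_t).
\begin{align*}
K(s) &\equiv N\int_{d(h'',h_t)<s}\rho(h'')\,r(h,h'')\,dh'',
  \qquad K(\infty)=\bar{k}(h), \\
\int N\rho(h')\,r(h,h')\,e^{-\bar{k}(h|h',h_t)}\,dh'
  &= \int_0^{\bar{k}(h)} e^{-K}\,dK
   \;=\; 1-e^{-\bar{k}(h)} .
\end{align*}
```

The second result is the probability that the current node has at least one neighbour, which is why dividing by it yields a normalized propagator.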



In the particular case of the $\mathbb{S}^1$ model, we can express this
propagator in terms of relative hidden distances instead of absolute
coordinates. Namely, $G(d', \omega'|d, \omega)$ is the probability that an
$\omega$-labeled node, e.g., a node with expected degree $\kappa = \omega$,
at hidden distance $d$ from the destination has as the next hop an
$\omega'$-labeled node at hidden distance $d'$ from the destination. After
tedious calculations, the resulting expression reads:

$$G(d', \omega'|d, \omega) =
\begin{cases}
(\gamma - 1)\,[\ldots]\,\exp\left\{[\ldots]\left[B(\ldots, \gamma-2, 2-\alpha) - B(\ldots, \gamma-2, 2-\alpha)\right]\right\}, & d' < d, \\[4pt]
(\gamma - 1)\,[\ldots]\,\exp\left\{[\ldots]\left[[\ldots]\,B(\ldots, \gamma-2, 2-\alpha) - B(\ldots, \gamma-2, 2-\alpha)\right]\right\}, & d' > d,
\end{cases} \tag{F13}$$

where

$$B(z, a, b) = z\,[\ldots], \tag{F14}$$

which is somewhat similar to the incomplete beta function
$\mathrm{B}(z, a, b) = \int_0^z t^{a-1}(1 - t)^{b-1}\,dt$.

FIG. 10: Probability $P_{up}(\omega/d^{1/2}, d)$.


One of the informative quantities elucidating the structure of
greedy-routing paths is the probability $P_{up}(\omega, d)$ that the next
hop after an $\omega$-labeled node at distance $d$ from the destination has
a higher value of $\omega$. The greedy-routing propagator defines this
probability as

$$P_{up}(\omega, d) = \int_{\omega' > \omega} d\omega' \int_{d' < d} dd'\, G(d', \omega'|d, \omega), \tag{F15}$$

and we show $P_{up}(\omega/d^{1/2}, d)$ in Fig. 10. We see that the proper
scaling of $\omega_c \sim d^{1/2}$, where $\omega_c$ is the critical value
of $\omega$ above which $P_{up}(\omega, d)$ quickly drops to zero, is
present only when clustering is strong. Furthermore, $P_{up}(\omega, d)$ is
an increasing function of $\omega$ for small $\omega$'s only when the
degree distribution exponent $\gamma$ is close to 2. A combination of these
two effects guarantees that the layout of greedy routes properly adapts to
increasing distances or graph sizes, thus making networks with strong
clustering and $\gamma$'s greater than but close to 2 navigable.